Microbiome Sample metadata
What is metadata?
- Metadata is a set of data that describes and provides information about other data. It is commonly defined as data about data.
- Sample metadata, as used here, refers to the description and context of the individual samples collected for a specific microbiome study.
Metadata structure
- Metadata collected at different stages are typically organized in an Excel or Google spreadsheet where:
- The metadata table columns represent the properties of the samples.
- The table rows contain information associated with the samples.
- Typically, the first column of sample metadata is Sample ID, which serves as the key identifying each individual sample.
- Sample ID must be unique.
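The uniqueness requirement can be checked before any downstream analysis. The following is a minimal sketch, assuming a comma-separated file with a header row, Sample ID in the first column, and no quoted fields that contain commas. The function name `find_duplicate_ids` and the example path are hypothetical.

```shell
#!/bin/bash
# Print every Sample ID (first column) that occurs more than once.
# Assumes: CSV with a header row and no commas inside quoted fields.
find_duplicate_ids() {
  # NR > 1 skips the header row; "sort | uniq -d" keeps repeated values only.
  awk -F',' 'NR > 1 { print $1 }' "$1" | sort | uniq -d
}

# Example usage (hypothetical path):
#   find_duplicate_ids data/metadata/SraRunTable.csv
# An empty result means every Sample ID is unique.
```

Metadata exported from spreadsheets often contains quoted fields, so a CSV-aware parser may be a safer choice for real data.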
Embedded metadata
- In most cases, you will find the metadata detached from the experimental data.
- Embedded metadata is stored together with the experimental data, which is especially useful for graphics.
- Major microbiome analysis platforms require sample metadata, commonly referred to as a mapping file, for downstream analysis.
Downloading NCBI-SRA metadata
For this demo, we will explore sample metadata retrieved from four randomly selected microbiome BioProjects:
- PRJNA477349: 16S: rRNA from bushmeat samples collected from Tanzania Metagenome
- PRJNA802976: 16S: Changes to Gut Microbiota following Systemic Antibiotic Administration in Infants
- PRJNA322554: 16S: The Early Infant Gut Microbiome Varies In Association with a Maternal High-fat Diet
- PRJNA937707: 16S: Exploring methods for manipulating both the composition and genomic content of bacteria in the mouse gut
Manually via SRA Run Selector
Sample metadata available in the SRA database can be retrieved manually via the SRA Run Selector.
- Note that the SRA metadata file is automatically named SraRunTable.txt.
- Users have the option to change the default TXT extension to another one, such as CSV.
- In our demo we will use CSV and save the metadata file in the data/metadata/ folder.
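The manual save step can also be scripted once the file has been downloaded. This is a small sketch, assuming the Run Selector table is already comma-delimited so that only the file name needs to change. The function name `save_run_table`, the download location, and the `<project>_SraRunTable.csv` naming pattern are hypothetical choices, not part of the original workflow.

```shell
#!/bin/bash
# Copy a downloaded Run Selector table into the project under a .csv name.
# Assumes the file content is already comma-delimited; only the name changes.
save_run_table() {
  local src="$1"                      # e.g. ~/Downloads/SraRunTable.txt (hypothetical)
  local project="$2"                  # BioProject accession used to label the copy
  local outdir="${3:-data/metadata}"  # destination folder, created if missing
  mkdir -p "$outdir"
  cp "$src" "$outdir/${project}_SraRunTable.csv"
}

# Example usage:
#   save_run_table ~/Downloads/SraRunTable.txt PRJNA477349
```

Labeling each copy with its BioProject accession avoids overwriting one download with the next, since every download carries the same default name.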
Example screenshot of the SRA Run Selector showing metadata associated with NCBI-SRA BioProject PRJNA477349
Computationally via Entrez Direct scripts
#!/bin/bash
# Retrieve run-level metadata (runinfo) for each BioProject with Entrez Direct.
# Create the output folder first so the redirections do not fail.
mkdir -p data/metadata
esearch -db sra -query 'PRJNA477349[bioproject]' | efetch -format runinfo > data/metadata/runinfo_PRJNA477349_metadata.csv
esearch -db sra -query 'PRJNA802976[bioproject]' | efetch -format runinfo > data/metadata/runinfo_PRJNA802976_metadata.csv
esearch -db sra -query 'PRJNA322554[bioproject]' | efetch -format runinfo > data/metadata/runinfo_PRJNA322554_metadata.csv
esearch -db sra -query 'PRJNA937707[bioproject]' | efetch -format runinfo > data/metadata/runinfo_PRJNA937707_metadata.csv
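After the downloads finish, it can help to confirm that each file actually contains runs. The sketch below counts data rows per runinfo file. It assumes the header line starts with a `Run` column and treats any line whose first field is exactly `Run` as a header, in case the header is repeated. The function names `count_runs` and `report_runs` are hypothetical.

```shell
#!/bin/bash
# Count data rows (runs) in a runinfo CSV.
# Blank lines and header lines (first field equal to "Run") are skipped.
count_runs() {
  awk -F',' 'NF > 0 && $1 != "Run" { n++ } END { print n + 0 }' "$1"
}

# Print "<file name><TAB><number of runs>" for every runinfo file in a folder.
report_runs() {
  for f in "$1"/runinfo_*_metadata.csv; do
    [ -e "$f" ] || continue   # the pattern matched no files
    printf '%s\t%s\n' "$(basename "$f")" "$(count_runs "$f")"
  done
}

# Example usage:
#   report_runs data/metadata
```

A count of 0 usually points to a failed query or an interrupted download and is worth checking before moving on.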
Computationally via the pysradb command-line tool
#!/bin/bash
# Retrieve detailed metadata for each BioProject with pysradb.
# The accession is passed as the positional argument (named srp_id in the pysradb help).
mkdir -p data/metadata
pysradb metadata --saveto data/metadata/pysradb_PRJNA477349_metadata.csv --detailed PRJNA477349
pysradb metadata --saveto data/metadata/pysradb_PRJNA802976_metadata.csv --detailed PRJNA802976
pysradb metadata --saveto data/metadata/pysradb_PRJNA322554_metadata.csv --detailed PRJNA322554
pysradb metadata --saveto data/metadata/pysradb_PRJNA937707_metadata.csv --detailed PRJNA937707
Explore microbiome sample metadata
Variable frequencies example
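Variable frequencies can be tabulated directly from a metadata file before plotting. This is a rough sketch, assuming a comma-separated file with a header row and no commas inside quoted fields. The function name `column_frequencies`, the example path, and the example column name are hypothetical.

```shell
#!/bin/bash
# Tabulate how often each value occurs in a named column of a CSV file.
# Output: "<count><TAB><value>", most frequent value first.
# Assumes a header row and no commas inside quoted fields.
column_frequencies() {
  awk -F',' -v col="$2" '
    # Header row: find the position of the requested column, then skip it.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }
    # Data rows: count each value, only if the column was found.
    idx     { count[$idx]++ }
    END     { for (v in count) printf "%d\t%s\n", count[v], v }
  ' "$1" | sort -k1,1nr -k2,2
}

# Example usage (hypothetical path and column name):
#   column_frequencies data/metadata/SraRunTable.csv host
```

If the column name is not present in the header, the function prints nothing.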
Sampling locations
References
[1] Buza, T. M., Tonui, T., Stomeo, F., Tiambo, C., Katani, R., Schilling, M., … Kapur, V. (2019). iMAP: An integrated bioinformatics and visualization pipeline for microbiome data analysis. BMC Bioinformatics, 20. https://doi.org/10.1186/S12859-019-2965-4
Appendix
Project main tree
.
├── LICENSE
├── README.md
├── Rplots.pdf
├── config
│  ├── config.yaml
│  ├── samples.tsv
│  └── units.tsv
├── dags
│  ├── rulegraph.png
│  └── rulegraph.svg
├── data
│  └── metadata
├── images
│  ├── PRJNA477349_variable_freq.png
│  ├── PRJNA477349_variable_freq.svg
│  ├── bkgd.png
│  ├── geeks.png
│  ├── gpsfiles
│  ├── metadata.png
│  ├── sample_gps.png
│  ├── smkreport
│  └── sra_run_selector.png
├── imap-sample-metadata.Rproj
├── index.Rmd
├── library
│  ├── apa.csl
│  ├── imap.bib
│  └── references.bib
├── report.html
├── resources
├── results
│  ├── PRJNA322554_read_size_asc.csv
│  ├── PRJNA322554_read_size_desc.csv
│  ├── PRJNA322554_sra_accessions.txt
│  ├── PRJNA322554_srarun_accessions.txt
│  ├── PRJNA477349_read_size_asc.csv
│  ├── PRJNA477349_read_size_desc.csv
│  ├── PRJNA477349_sra_accessions.txt
│  ├── PRJNA477349_srarun_accessions.txt
│  ├── PRJNA589182_read_size_asc.csv
│  ├── PRJNA589182_read_size_desc.csv
│  ├── PRJNA589182_sra_accessions.txt
│  ├── PRJNA589182_srarun_accessions.txt
│  ├── PRJNA802976_read_size_asc.csv
│  ├── PRJNA802976_read_size_desc.csv
│  ├── PRJNA802976_sra_accessions.txt
│  ├── PRJNA802976_srarun_accessions.txt
│  ├── PRJNA937707_read_size_asc.csv
│  ├── PRJNA937707_read_size_desc.csv
│  ├── PRJNA937707_sra_accessions.txt
│  ├── PRJNA937707_srarun_accessions.txt
│  ├── project_tree.txt
│  └── sample_location.csv
├── styles.css
└── workflow
   ├── Snakefile
   ├── envs
   ├── reports
   ├── rules
   ├── schemas
   └── scripts
16 directories, 45 files
Screenshot of interactive snakemake report
The interactive Snakemake HTML report can be viewed by opening report.html in any compatible browser. There you can explore the workflow and its associated statistics. Closing the left sidebar gives a wider display area.